pp002 Research Paper Reading Notes

Edited by Ben. Get the knowledge flowing and circulating! :)


Today I read a paper that I find very interesting.¹

Problem Introduction

Existing methods for computing trajectory similarity prioritize spatial similarity over temporal similarity. As a result, they are much weaker for time-aware analysis!

The paper considers fine-grained spatial and temporal relations between trajectories and computes spatio-temporal similarity between trajectories in road networks.

Details

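Spatio-temporal modeling requires encoding both the temporal and the spatial information of trajectories. The paper proposes, for the first time, a generic temporal modeling module (TMM).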

Spatio-temporal co-attention fusion (STCF): two fusion strategies are designed to generate unified spatio-temporal embeddings of trajectories.

Concepts I Don't Understand Yet

triplet loss
curriculum learning

How the paper describes these concepts:

Further, under the guidance of triplet loss, ST2Vec employs curriculum learning in model optimization to improve convergence and effectiveness.
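To get a handle on triplet loss, here is a minimal generic sketch in Python (my own illustration, not ST2Vec's code). Given embeddings of an anchor trajectory, a positive trajectory (more similar to the anchor) and a negative one (less similar), the loss is zero only when the anchor is closer to the positive than to the negative by at least a margin. The Euclidean distance and `margin=1.0` are assumptions.

```python
import math
from typing import Sequence


def triplet_loss(anchor: Sequence[float], positive: Sequence[float],
                 negative: Sequence[float], margin: float = 1.0) -> float:
    """Hinge-style triplet loss on embedding vectors.

    Returns 0 once d(anchor, negative) >= d(anchor, positive) + margin;
    otherwise the loss grows with how badly the ordering is violated.
    """
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


print(triplet_loss((0.0, 0.0), (1.0, 0.0), (3.0, 0.0)))  # → 0.0 (negative far enough)
print(triplet_loss((0.0, 0.0), (1.0, 0.0), (1.5, 0.0)))  # → 0.5 (negative too close)
```

During training, the gradient of this loss pulls the positive's embedding toward the anchor and pushes the negative's away, which is how the similarity ordering gets encoded into the embedding space.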

This paper adopts an orthogonal but complementary approach to existing space-driven similarity learning studies.

Excerpts

Content from this paper that did not fit neatly elsewhere in these notes:

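The paper gives a formal representation of trajectories
The concept of trajectory similarity computation
Traditional methods for computing the similarity between two trajectories: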
- free-space measures such as DTW, LCSS, Hausdorff, ERP
- road network measures such as TP, DITA, LCRS, and NetERP
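  Where these methods fall short:
  - incur high computation cost
  - Reason: they rely on pointwise matching, which incurs quadratic computational complexity
- To address this: neural-network-based models learn deep representations of trajectories,
  - so that the similarity relations between trajectories are preserved in low-dimensional embeddings
  - This has been shown to be faster than techniques that operate directly on the raw trajectories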
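To see where the quadratic cost comes from, here is a minimal DTW sketch in Python (a textbook formulation assumed for illustration, not code from the paper). It fills an n by m table and computes one pointwise distance per cell, so comparing trajectories of lengths n and m costs O(nm). Points are plain (x, y) tuples with Euclidean distance, which is an assumption; road-network measures would use network distances instead.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def dtw(a: List[Point], b: List[Point]) -> float:
    """Dynamic Time Warping distance between two point sequences.

    dp[i][j] is the cost of the best alignment of a[:i] with b[:j].
    Each of the n*m cells needs one pointwise distance, hence O(n*m) time.
    """
    n, m = len(a), len(b)
    dp = [[math.inf] * (m + 1) for _ in range(n + 1)]
    dp[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])  # pointwise matching
            # best of: step in a only, step in b only, step in both
            dp[i][j] = cost + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[n][m]


print(dtw([(0.0, 0.0), (1.0, 0.0)], [(0.0, 1.0), (1.0, 1.0)]))  # → 2.0
```

Doubling both trajectory lengths roughly quadruples the work, and a similarity search has to repeat this for every candidate pair.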

Although the above works improve efficiency, many limitations remain.

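They all disregard the temporal dimension of spatio-temporal trajectories.
- That is, when learning embeddings they consider only the spatial dimensions of trajectories, so they are ineffective in scenarios where the temporal aspect is important.
- Overall, considering spatial and temporal similarity together is crucial in time-aware applications,
  - e.g., transportation planning;
  - another example: monitoring.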

Although existing studies provide guidance for spatial similarity modeling, three non-trivial challenges remain:

temporal similarity learning
spatio-temporal fusion
model optimization

The challenges in detail:

C1: How can temporal correlations between trajectories be captured in temporal similarity learning?

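The core task: generate time-oriented trajectory embeddings that preserve the temporal similarity relations between trajectories.
- The natural idea is to feed the time sequences contained in trajectories into RNN models to capture sequential information. This is similar to existing spatial similarity learning, which typically feeds spatial sequences into RNNs.
- However, temporal modeling is more challenging:
  - because the spatial locations of a trajectory are discrete, spatial distance can be evaluated with a concrete distance measure;
  - whereas the temporal information of trajectories exhibits continuous and periodic patterns.
    1. First, time never stops, so it is hard to find a satisfactory granularity for discretizing it;
    2. Second, trajectories exhibit strong periodicity at multiple scales (seconds, hours, days, etc.).
    Therefore, the time representation must be invariant to time rescaling.
- In short, directly feeding time information into RNNs to learn temporal embeddings is unreliable, because this does not contend with the characteristics of time.
- After two attempts requiring non-trivial effort (Section 4.2), the paper designs the TMM module, which achieves effective temporal trajectory representation learning.
- Key point: TMM is flexible and generic, so it can be integrated into any existing spatial trajectory similarity learning proposal to compute spatio-temporally aware trajectory similarity.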
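A common way to encode continuous, periodic time is a Time2Vec-style encoding: one linear term plus sinusoidal terms with learnable frequencies. Whether TMM uses exactly this is my assumption, not something these notes confirm; the sketch below only shows why such an encoding copes with periodicity and time rescaling. `omega` and `phi` would be learnable in a real model; the fixed values here are hypothetical.

```python
import math
from typing import List


def time2vec(tau: float, omega: List[float], phi: List[float]) -> List[float]:
    """Time2Vec-style encoding of a scalar timestamp tau.

    Entry 0 is linear (non-periodic progression of time); entries 1..k are
    sin(omega_i * tau + phi_i) (periodic patterns). In a real model omega and
    phi are learned; here they are fixed, hypothetical values.
    """
    if not omega or len(omega) != len(phi):
        raise ValueError("omega and phi must be non-empty and equally long")
    out = [omega[0] * tau + phi[0]]
    out += [math.sin(w * tau + p) for w, p in zip(omega[1:], phi[1:])]
    return out


DAY_S = 86400.0
# Linear term plus a daily and a weekly period, with tau measured in seconds.
omega_s = [1.0 / DAY_S, 2 * math.pi / DAY_S, 2 * math.pi / (7 * DAY_S)]
phi = [0.0, 0.0, 0.0]

t = 8 * 3600.0                                   # 08:00 on day 0, in seconds
v_sec = time2vec(t, omega_s, phi)
v_next_day = time2vec(t + DAY_S, omega_s, phi)   # daily entry (index 1) repeats

# The same instant measured in hours: rescaling omega absorbs the unit change.
omega_h = [w * 3600.0 for w in omega_s]
v_hour = time2vec(t / 3600.0, omega_h, phi)
```

Because a change of time unit can be absorbed by the frequencies, no single discretization granularity has to be fixed up front, which is the property the notes call invariance to time rescaling.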

C2: How can spatial and temporal features be fused into unified trajectory embeddings for spatio-temporal similarity learning?

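Once the spatial and temporal features are captured, they should be fused into unified spatio-temporal trajectory embeddings.
- However, different applications may assign different weights to spatial and temporal similarity:
  - e.g., region function estimation focuses on the spatial aspect, so it assigns a higher weight to spatial similarity;
  - ridesharing, by contrast, may focus on the temporal aspect, so it assigns a higher weight to temporal similarity.
- So, overall, a preferable fusion method must account for different spatio-temporal weights without harming model convergence, especially when the temporal and spatial dimensions are considered jointly.
- The paper proposes a fusion module called STCF to generate unified embeddings.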

C3: How can the learning of embeddings be optimized to improve effectiveness and efficiency?

Learning the embeddings has two main goals: effectiveness and efficiency.
- effectiveness: quality of the embeddings
- efficiency: model convergence
Many factors affect model performance, e.g., the training samples, the learning procedure, and the network parameters.
- To improve effectiveness, the paper adopts triplet loss and trains the model with curriculum learning.
- To avoid excessive parameters and improve efficiency, the paper proposes two different fusion methods in the co-attention fusion module.
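Curriculum learning means training on easy examples first and gradually adding harder ones. Here is a minimal sketch of one way to schedule triplets, assuming a triplet's difficulty is measured by its ground-truth gap d(anchor, negative) - d(anchor, positive) (a larger gap means an easier triplet) and a fixed number of stages. Both choices, and the function name `curriculum_stages`, are hypothetical; this is not necessarily how ST2Vec builds its curriculum.

```python
import math
from typing import List, Sequence, Tuple

Triplet = Tuple[int, int, int]  # (anchor, positive, negative) trajectory ids


def curriculum_stages(triplets: Sequence[Triplet], gap: Sequence[float],
                      n_stages: int) -> List[List[Triplet]]:
    """Return the training pool for each stage, easiest triplets first.

    gap[i] is the (hypothetical) ground-truth gap d(a, n) - d(a, p) of
    triplets[i]; a larger gap means an easier triplet. Stage s trains on
    the easiest s / n_stages fraction of all triplets.
    """
    if len(triplets) != len(gap) or n_stages < 1:
        raise ValueError("need one gap per triplet and at least one stage")
    order = sorted(range(len(triplets)), key=lambda i: gap[i], reverse=True)
    stages = []
    for s in range(1, n_stages + 1):
        k = math.ceil(len(order) * s / n_stages)
        stages.append([triplets[i] for i in order[:k]])
    return stages


triplets = [(0, 1, 2), (0, 3, 4), (0, 5, 6), (0, 7, 8)]
gap = [0.1, 0.9, 0.5, 0.3]
for pool in curriculum_stages(triplets, gap, n_stages=2):
    print(pool)  # in a real loop: run some epochs of triplet-loss training on pool
```

Starting from clear-cut triplets gives the model an easy signal early on, which is the intuition behind the claimed improvement in convergence.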

In summary, to address these three challenges, the paper proposes a representation learning architecture called ST2Vec.

Main Contributions

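The first work on time-aware trajectory representation and hybrid spatio-temporal trajectory similarity learning, supporting adjustable spatio-temporal weights and a range of classic measures.
After two attempts at temporal modeling, the paper proposes a generic temporal representation learning module.
Further, it proposes a spatio-temporal co-attention fusion module that uses two different fusion strategies (SF and UF) to integrate the spatial and temporal features of trajectories.
Why emphasize the two attempts?
For learning-based trajectory similarity computation, this is the first work to introduce triplet loss and curriculum learning to guide the learning process, further improving model accuracy and convergence.
Experiments on 4 popular network-aware trajectory measures show good performance in terms of effectiveness, efficiency, and low parameter sensitivity. An ablation study verifies the efficacy of the key design decisions. In addition, two real-world case studies demonstrate ST2Vec's capability in downstream tasks.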

Frequently Occurring Concepts

Non-learning-based methods | Learning-based methods

Non-learning-based methods: rely on well-defined similarity measures and acceleration techniques
- These include, e.g., network-aware similarity measures and free-space based measures
  - How network-aware similarity computation techniques work:
    1. First, they map trajectories to road-network paths that consist of vertices or segments;
    2. Then, they define similarity measures based on classic distance measures such as Hausdorff [1], DTW [34], LCSS [27], and ERP [5], generally by aggregating the distances between road vertices or segments of two trajectories;
    3. For example, NetERP aggregates shortest-path distances between the vertices of two trajectories;
    4. Another example: LCSS+ proposes Longest Overlapping Road Segments for similarity computation;
    5. Yet another: direction-aware Longest Common Road Segments (LCRS) has been proposed.
    All of these methods have very high computational cost!
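To make the "map to road-network paths, then aggregate vertex distances" recipe concrete, here is a sketch of a Hausdorff-style distance over road vertices, using shortest-path distances from Dijkstra's algorithm. This is a generic illustration under assumptions (a toy undirected graph, trajectories already map-matched to vertex ids); it is not NetERP, TP, or any specific measure from the paper. It runs one Dijkstra per distinct vertex and then a pairwise min/max over vertices, which hints at why these measures are expensive.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Graph = Dict[int, List[Tuple[int, float]]]  # vertex -> [(neighbor, edge length)]


def undirected(edges: Sequence[Tuple[int, int, float]]) -> Graph:
    """Build an adjacency list for a toy undirected road graph."""
    g: Graph = {}
    for u, v, w in edges:
        g.setdefault(u, []).append((v, w))
        g.setdefault(v, []).append((u, w))
    return g


def dijkstra(graph: Graph, src: int) -> Dict[int, float]:
    """Shortest-path distances from src to every reachable vertex."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def network_hausdorff(graph: Graph, tr_a: Sequence[int], tr_b: Sequence[int]) -> float:
    """Hausdorff distance between two map-matched trajectories (vertex id lists),
    using shortest-path distance instead of Euclidean distance."""
    sp = {u: dijkstra(graph, u) for u in set(tr_a) | set(tr_b)}

    def directed(x: Sequence[int], y: Sequence[int]) -> float:
        # distance from the worst-covered vertex of x to its nearest vertex in y
        return max(min(sp[u].get(v, math.inf) for v in y) for u in x)

    return max(directed(tr_a, tr_b), directed(tr_b, tr_a))


g = undirected([(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 4, 1.0)])
print(network_hausdorff(g, [0, 1, 2], [2, 3, 4]))  # → 2.0
```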
Learning-based methods: increasingly popular because they leverage advances in deep learning, e.g., its increasingly powerful approximation capabilities.
- The learning-based methods learn distance functions that embed input trajectories and approximate given distance measures.
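  At first "embed" here looked like a tense error, but it is correct: its subject is the plural "distance functions".
  In other words, learning-based methods learn distance functions that embed input trajectories and approximate given distance measures.
  The benefit: trajectory embeddings are generated that enable fast trajectory similarity computation and downstream analyses.
- t2vec: accounts for low sampling rates and noisy points in deep representation learning. It targets representation learning, not similarity metric learning (because it addresses a different problem).
- NEUTRAJ: employs metric learning to approximate trajectory similarity for different free-space based distance measures.
- Traj2SimVec: considers sub-trajectory similarity during learning.
- T3S: uses an attention mechanism to improve performance.
- Some other works focus on semantic-aware trajectory similarity learning.
However, while these studies all make advances, they all target spatial trajectory similarity in free space.
- GTS: targets trajectory similarity learning, but it is designed for POI-based spatial trajectory similarity computation. As a result, GTS regards paths that share the same or nearby POIs as similar, even when the paths themselves are completely different. GTS learns a single type of distance measure (i.e., TP [21], an extension of the Hausdorff distance).
In contrast, ST2Vec accommodates a range of popular measures and takes the temporal dimension into account, on par with the spatial dimension.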
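The "learn a distance function that approximates a given measure" idea can be sketched as a regression objective: embed each trajectory once, then penalize the gap between embedding distances and ground-truth distances (e.g., precomputed DTW or a network measure). The squared-error form, the dictionary data layout, and the helper names are assumptions for illustration; NEUTRAJ, T3S, ST2Vec and the others each use their own losses. Once the embeddings exist, comparing two trajectories costs a single O(d) vector distance.

```python
import math
from typing import Dict, List, Tuple

Emb = Dict[int, List[float]]  # trajectory id -> embedding vector


def approximation_loss(emb: Emb, gt: Dict[Tuple[int, int], float]) -> float:
    """Mean squared error between embedding distances and ground-truth distances."""
    if not gt:
        raise ValueError("need at least one labelled pair")
    errs = [(math.dist(emb[i], emb[j]) - d) ** 2 for (i, j), d in gt.items()]
    return sum(errs) / len(errs)


def nearest(emb: Emb, query: int, k: int) -> List[int]:
    """Top-k most similar trajectories by embedding distance: O(N * d) per query,
    with no pointwise matching on the raw trajectories."""
    others = [t for t in emb if t != query]
    return sorted(others, key=lambda t: math.dist(emb[query], emb[t]))[:k]


emb = {0: [0.0, 0.0], 1: [3.0, 4.0], 2: [0.0, 1.0]}
print(approximation_loss(emb, {(0, 1): 5.0, (0, 2): 1.0}))  # → 0.0 (perfect fit)
print(nearest(emb, 0, 1))                                   # → [2]
```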

Further Reading

[7]

triplet loss [19]

Useful Words and Phrases

No need for fancy, high-brow language or literary elegance! Just get our ideas across effectively!

Specifically, xxx. Usually used when explaining something in further detail.

In particular, xxx. Used when something needs extra emphasis.

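Welcome to my "Papers I've Read" series. These notes are one of the most important parts of my research learning process!
The notes in this series mainly include:
excerpts of good ideas from papers;
my own thoughts prompted by the work in papers;
good words and sentences from papers;
and other valuable or meaningful content worth absorbing.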

¹ Fang Z, Du Y, Zhu X, et al. Spatio-temporal trajectory similarity learning in road networks[C]. Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2022: 347-356.